Add schema for sample #84
Test PASSed. |
See my comment on bigdatagenomics/adam#1039. |
@heuermh - if you want to keep this PR focused on schema you can assign an issue to me to update the ETL code to load and save using the new |
How are those fields populated? What do they mean? Do they map to SRA metadata fields, which per earlier conversation and the doc comments all go into |
The VCF spec, in the context of "5.4.10 Sample mixtures", has
but I'm not so sure that is going into one of these fields. |
The cardinality for Genomes, Mixture, and Description is 0..*, so they would need to go in array fields. Alternatively, they could go in a single array of Genome records, each with name, mixture, and description fields. However, the doc already specifically recommends those go in |
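To make the two modeling options above concrete, here is a minimal sketch in Python rather than in the project's schema language. The class names `Genome` and `Sample`, the helper `build_sample`, and the example values are all hypothetical and not part of this PR; the sketch only shows how three parallel 0..* lists could collapse into a single array of records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Genome:
    # Hypothetical record: one entry per genome contributing to a sample.
    name: str
    mixture: Optional[float] = None
    description: Optional[str] = None


@dataclass
class Sample:
    # Hypothetical record: a sample holding 0..* Genome entries.
    sample_id: str
    genomes: List[Genome] = field(default_factory=list)


def build_sample(sample_id, names, mixtures, descriptions):
    """Zip three parallel 0..* lists into a single list of Genome records."""
    if not (len(names) == len(mixtures) == len(descriptions)):
        raise ValueError("Genomes, Mixture, and Description must have equal length")
    genomes = [Genome(n, m, d) for n, m, d in zip(names, mixtures, descriptions)]
    return Sample(sample_id, genomes)


# Illustrative values only (assumed, not taken from this thread).
sample = build_sample(
    "Tumor",
    ["Germline", "Tumor"],
    [0.3, 0.7],
    ["Patient germline genome", "Patient tumor genome"],
)
```

The single-array form keeps each genome's name, mixture, and description together, whereas three separate array fields rely on positional alignment between the arrays.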
Putting them in attributes is fine with me. |
Although now that I've said that repeated keys in |
I'm not convinced that these two fields come from this sample-mixture tag anyhow, or that they currently have any source in VCF, so maybe wait to worry about modeling. I don't think these fields get populated anyhow. I am more concerned with the principle: if sample-level metadata like this appears in the future and is somehow shoe-horned into these fields, it should not appear in our inner-inner loop inside of |
Yes, and #83. I'll rebase to resolve conflicts in a sec |
Attributes with duplicate keys are going to get clobbered. That wasn't a major issue in the Feature record, where this also came up, but it will be here. The cardinality of the VCF sample attributes and of nearly all of the SRA attributes is 0..*. Is this a reasonable workaround?
or do we want to reopen discussion about a bunch of array fields? |
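As an illustration of the clobbering concern raised above, and not the workaround proposed in that comment (which is not shown here), the following Python sketch contrasts a plain string-to-string map with a map of lists. The keys and values are made up for the example.

```python
from collections import defaultdict

# Hypothetical attribute pairs with a repeated key (cardinality 0..*).
pairs = [
    ("Genomes", "Germline"),
    ("Genomes", "Tumor"),
    ("Description", "example"),
]

# A plain key -> value map keeps only the last value seen for each key.
flat = dict(pairs)

# A key -> list-of-values map preserves every value, in input order.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
multi = dict(grouped)
```

Here `flat["Genomes"]` retains only `"Tumor"`, while `multi["Genomes"]` retains both values. Whether that trade-off justifies array-valued fields is the question left open in the comment above.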
Test PASSed. |
Test PASSed. |
Thanks! |
See bigdatagenomics/adam#1039